[Bug](memory) Fix exception-unsafe in aggregation node #28483

Merged 4 commits on Dec 15, 2023

xy720 (Member) commented Dec 15, 2023

Proposed changes

Issue Number: close #xxx

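The alloc function may throw a std::bad_alloc exception when process memory exceeds the limit. The logs below show such a failure inside AggregateDataContainer::_expand(), followed by a SIGSEGV when the aggregation node is closed.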

be.INFO:

W1214 09:14:17.434849 771103 mem_tracker_limiter.cpp:204] Memory limit exceeded:<consuming tracker:<Load#Id=28448230da1f432e-8a66597e10329235>, process memory used 20.41 GB exceed limit 18.76 GB or sys mem available 9.04 GB less than low water mark 1.60 GB, failed alloc size 1.86 MB>, executing msg:<execute:<>>. backend xx.x.x.xxx process memory used 20.41 GB, limit 18.76 GB. If query tracker exceed, set exec_mem_limit=8G to change limit, details see be.INFO.
Process Memory Summary:
    OS physical memory 31.26 GB. Process memory usage 20.41 GB, limit 18.76 GB, soft limit 16.88 GB. Sys available memory 9.04 GB, low water mark 1.60 GB, warning water mark 3.20 GB. Refresh interval memory growth 0 B
Alloc Stacktrace:
    @     0x555cd858bee9  doris::MemTrackerLimiter::print_log_usage()
    @     0x555cd859a384  doris::ThreadMemTrackerMgr::exceeded()
    @     0x555cd85a0ac4  malloc
    @     0x555cd8fcf368  Allocator<>::alloc()
    @     0x555cd8fdbdaf  doris::vectorized::Arena::add_chunk()
    @     0x555cd96dc0ab  doris::vectorized::AggregateDataContainer::_expand()
    @     0x555cd96aded8  (unknown)
    @     0x555cd969fa2c  doris::vectorized::AggregationNode::_pre_agg_with_serialized_key()
    @     0x555cd96d1d61  std::_Function_handler<>::_M_invoke()
    @     0x555cd967ab0b  doris::vectorized::AggregationNode::get_next()
    @     0x555cd81282a6  doris::ExecNode::get_next_after_projects()
    @     0x555cd8452968  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x555cd845553b  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x555cd8456a9e  doris::PlanFragmentExecutor::open()
    @     0x555cd842f200  doris::FragmentExecState::execute()
    @     0x555cd843280e  doris::FragmentMgr::_exec_actual()
    @     0x555cd8432d42  _ZNSt17_Function_handlerIFvvEZN5doris11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsESt8functionIFvPNS1_20PlanFragmentExecutorEEEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x555cd86ead05  doris::ThreadPool::dispatch_thread()
    @     0x555cd86e015f  doris::Thread::supervise_thread()
    @     0x7f3321593ea5  start_thread
    @     0x7f33218a69fd  __clone
    @              (nil)  (unknown)

Memory Tracker Summary:    MemTrackerLimiter Label=Load#Id=28448230da1f432e-8a66597e10329235, Type=load, Limit=8.00 GB(8589934592 B), Used=273.64 MB(286932828 B), Peak=273.98 MB(287286184 B)
    MemTracker Label=VDataStreamRecvr:28448230da1f432e-8a66597e1032923a, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=1.08 KB(1104 B), Peak=2.66 KB(2728 B)
    MemTracker Label=OlapTablePartitionParam, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=533.00 B(533 B), Peak=533.00 B(533 B)
    MemTracker Label=OlapTableSink:0, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=23.89 KB(24464 B), Peak=23.89 KB(24464 B)
    MemTracker Label=IndexChannel:indexID=13928, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=18.74 KB(19192 B), Peak=18.74 KB(19192 B)
    MemTracker Label=NodeChannel:indexID=13928:threadId=139848762504960, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=216.00 B(216 B), Peak=216.00 B(216 B)
    MemTracker Label=NodeChannel:indexID=13928:threadId=139848762504960, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=216.00 B(216 B), Peak=216.00 B(216 B)
    MemTracker Label=NodeChannel:indexID=13928:threadId=139848762504960, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=216.00 B(216 B), Peak=216.00 B(216 B)
    MemTracker Label=VDataStreamRecvr:28448230da1f432e-8a66597e10329236, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=75.12 KB(76920 B), Peak=3.34 MB(3498744 B)
    MemTracker Label=AggregationNode:Data, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=699.98 KB(716776 B), Peak=699.98 KB(716776 B)
    MemTracker Label=VDataStreamSender:28448230da1f432e-8a66597e10329236, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=3.44 KB(3520 B), Peak=588.59 KB(602720 B)
    MemTracker Label=VDataStreamRecvr:28448230da1f432e-8a66597e10329239, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=5.46 KB(5592 B), Peak=1.57 MB(1645560 B)
    MemTracker Label=AggregationNode:Data, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=20.00 MB(20975576 B), Peak=20.00 MB(20975576 B)
    MemTracker Label=VDataStreamRecvr:28448230da1f432e-8a66597e10329238, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=223.35 KB(228712 B), Peak=6.88 MB(7219304 B)
    MemTracker Label=AggregationNode:Data, Parent Label=Load#Id=28448230da1f432e-8a66597e10329235, Used=1.25 MB(1314776 B), Peak=1.25 MB(1314776 B)
    
W1214 09:14:23.514665 771103 fragment_mgr.cpp:118] query_id=28448230da1f432e-8a66597e10329235, instance_id=28448230da1f432e-8a66597e10329236 
meet error status [MEM_LIMIT_EXCEEDED]PreCatch std::bad_alloc, Memory limit exceeded:<consuming tracker:<Load#Id=28448230da1f432e-8a66597e10329235>, 
process memory used 20.41 GB exceed limit 18.76 GB or sys mem available 9.04 GB less than low water mark 1.60 GB, failed alloc size 1.86 MB>, 
executing msg:<execute:<>>. backend xx.x.x.xxx process memory used 20.41 GB, limit 18.76 GB. If query tracker exceed, `set exec_mem_limit=8G` 
to change limit, details see be.INFO.
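
As a reading aid, the warning above names two conditions joined by "or": process memory above its limit, or available system memory below the low water mark. A minimal sketch of that predicate follows, using the figures from the log. The names `MemSnapshot` and `limit_exceeded` are hypothetical and are not taken from the Doris source; the real check may consider more state than this.

```cpp
// Hypothetical snapshot of the four figures printed in the warning (all in GB).
struct MemSnapshot {
    double process_used_gb;
    double process_limit_gb;
    double sys_available_gb;
    double low_water_mark_gb;
};

// Returns true when either condition named in the log message holds.
// This is an assumption about the shape of the check, not the actual code.
bool limit_exceeded(const MemSnapshot& s) {
    const bool over_process_limit = s.process_used_gb > s.process_limit_gb;
    const bool below_low_water_mark = s.sys_available_gb < s.low_water_mark_gb;
    return over_process_limit || below_low_water_mark;
}
```

With the logged values (20.41 GB used, 18.76 GB limit, 9.04 GB available, 1.60 GB low water mark) only the first condition holds, which is enough to reject the 1.86 MB allocation.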

coredump:

*** SIGSEGV unkown detail explain (@0x0) received by PID 125689 (TID 0x7f4cb41f1700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /data/doris-1.x/be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 4# 0x00007F567BA2C400 in /lib64/libc.so.6
 5# doris::vectorized::IAggregateFunctionDataHelper<doris::vectorized::AggregateFunctionMaxData<doris::vectorized::SingleValueDataString>, doris::vectorized::AggregateFunctionsSingleValue<doris::vectorized::AggregateFunctionMaxData<doris::vectorized::SingleValueDataString>, false> >::destroy(char*) const at /data/doris-1.x/be/src/vec/aggregate_functions/aggregate_function.h:389
 6# _ZNSt8__detail9__variant17__gen_vtable_implINS0_12_Multi_arrayIPFNS0_21__deduce_visit_resultIvEEOZN5doris10vectorized15AggregationNode26_close_with_serialized_keyEvEUlOT_E_RSt7variantIJNS6_27AggregationMethodSerializedI9PHHashMapI9StringRefPc11DefaultHashISF_vELb0EEEENS6_26AggregationMethodOneNumberIh12FixedHashMapIhSG_28FixedHashMapImplicitZeroCellIhSG_16HashTableNoStateE28FixedHashTableCalculatedSizeISP_E9AllocatorILb1ELb1EEELb0EEENSL_ItSM_ItSG_SN_ItSG_SO_E24FixedHashTableStoredSizeISW_EST_ELb0EEENSL_IjSE_IjSG_9HashCRC32IjELb0EELb0EEENSL_ImSE_ImSG_S11_ImELb0EELb0EEENS6_30AggregationMethodStringNoCacheI13StringHashMapISG_ST_EEENSL_INS6_7UInt128ESE_IS1C_SG_S11_IS1C_ELb0EELb0EEENSL_IjSE_IjSG_14HashMixWrapperIjS12_ELb0EELb0EEENSL_ImSE_ImSG_S1G_ImS15_ELb0EELb0EEENSL_IS1C_SE_IS1C_SG_S1G_IS1C_S1D_ELb0EELb0EEENS6_37AggregationMethodSingleNullableColumnINSL_IhNS6_26AggregationDataWithNullKeyISU_EELb0EEEEENS1Q_INSL_ItNS1R_ISZ_EELb0EEEEENS1Q_INSL_IjNS1R_IS13_EELb0EEEEENS1Q_INSL_ImNS1R_IS16_EELb0EEEEENS1Q_INSL_IjNS1R_IS1I_EELb0EEEEENS1Q_INSL_ImNS1R_IS1L_EELb0EEEEENS1Q_INSL_IS1C_NS1R_IS1E_EELb0EEEEENS1Q_INSL_IS1C_NS1R_IS1O_EELb0EEEEENS1Q_INS18_INS1R_IS1A_EEEEEENS6_26AggregationMethodKeysFixedIS16_Lb0EEENS2J_IS16_Lb1EEENS2J_IS1E_Lb0EEENS2J_IS1E_Lb1EEENS2J_ISE_INS6_7UInt256ESG_S11_IS2O_ELb0EELb0EEENS2J_IS2Q_Lb1EEENS2J_IS1L_Lb0EEENS2J_IS1L_Lb1EEENS2J_IS1O_Lb0EEENS2J_IS1O_Lb1EEENS2J_ISE_IS2O_SG_S1G_IS2O_S2P_ELb0EELb0EEENS2J_IS2Y_Lb1EEEEEEJEEESt16integer_sequenceImJLm0EEEE14__visit_invokeESB_S32_ at /var/local/ldb-toolchain/include/c++/11/variant:1013
 7# doris::vectorized::AggregationNode::_close_with_serialized_key() at /data/doris-1.x/be/src/vec/exec/vaggregation_node.cpp:1352
 8# doris::vectorized::AggregationNode::close(doris::RuntimeState*) at /data/doris-1.x/be/src/vec/exec/vaggregation_node.cpp:535
 9# doris::PlanFragmentExecutor::close() at /data/doris-1.x/be/src/runtime/plan_fragment_executor.cpp:687
10# doris::FragmentExecState::execute() at /data/doris-1.x/be/src/runtime/fragment_mgr.cpp:269
11# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) at /data/doris-1.x/be/src/runtime/fragment_mgr.cpp:509
12# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:291
13# doris::ThreadPool::dispatch_thread() at /data/doris-1.x/be/src/util/threadpool.cpp:543
14# doris::Thread::supervise_thread(void*) at /data/doris-1.x/be/src/util/thread.cpp:455
15# start_thread in /lib64/libpthread.so.0
16# __clone in /lib64/libc.so.6

check owner for /data/cdw/doris/be/pid ...
check owner for /data/cdw/doris/be/var/pull_load ...
check owner for /data/cdw/doris/be/lib/small_file/ ...
check owner for /data/cdw/doris/be/storage ...
start time: Sunday, December 3, 2023 21:15:15 CST
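
Taken together, the two traces suggest a typical exception-safety problem: an allocation throws partway through building aggregate state, and the later cleanup in close() reaches an entry that was never fully constructed. The sketch below is a toy model of that pattern, assuming C++17. It is not the Doris code and not necessarily the fix applied in this PR. `State`, `LimitedAllocator`, `AggTable` and `insert_catching` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one aggregate state (e.g. a MAX over strings).
struct State {
    std::string value;
};

// Hypothetical allocator that throws std::bad_alloc once its budget is used up,
// imitating an allocation rejected because the memory limit was exceeded.
struct LimitedAllocator {
    std::size_t remaining;
    State* create() {
        if (remaining == 0) throw std::bad_alloc();
        --remaining;
        return new State();
    }
};

// Toy aggregation table mapping a key to a heap-allocated state.
class AggTable {
public:
    explicit AggTable(LimitedAllocator alloc) : _alloc(alloc) {}
    ~AggTable() { close(); }

    // Unsafe order: the key is published first, so a throwing allocation
    // leaves behind an entry whose state pointer is still null.
    void insert_unsafe(const std::string& key) {
        auto [it, inserted] = _states.try_emplace(key, nullptr);
        if (inserted) it->second = _alloc.create();
    }

    // Safer order: allocate first, publish only once the allocation succeeded.
    void insert_safe(const std::string& key) {
        if (_states.count(key) != 0) return;
        State* state = _alloc.create();  // may throw; the table is untouched so far
        try {
            _states.emplace(key, state);
        } catch (...) {
            delete state;  // do not leak if the map itself fails to grow
            throw;
        }
    }

    // Cleanup that tolerates half-built entries; returns how many it skipped.
    std::size_t close() {
        std::size_t skipped = 0;
        for (auto& entry : _states) {
            if (entry.second == nullptr) {
                ++skipped;
                continue;
            }
            delete entry.second;
        }
        _states.clear();
        return skipped;
    }

    std::size_t size() const { return _states.size(); }

private:
    LimitedAllocator _alloc;
    std::unordered_map<std::string, State*> _states;
};

// Inserts one key and reports whether it succeeded without std::bad_alloc.
bool insert_catching(AggTable& table, const std::string& key, bool safe) {
    try {
        if (safe) {
            table.insert_safe(key);
        } else {
            table.insert_unsafe(key);
        }
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

With a budget of one allocation, the unsafe path ends up with two entries, one of them null, so a cleanup that destroys every entry unconditionally would dereference a null pointer. The safe path leaves a single valid entry and nothing for close() to skip.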

xy720 (Member Author) commented Dec 15, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

mrhhsg (Member) left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

yiguolei (Contributor) left a comment

LGTM

github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Dec 15, 2023
Contributor

PR approved by at least one committer and no changes requested.

doris-robot commented

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.56 seconds
stream load tsv: 582 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17222621461 Bytes

yiguolei added the dev/2.0.4, usercase (Important user case type label), and pick_pipelinex labels Dec 15, 2023
yiguolei merged commit fb925bd into apache:master Dec 15, 2023
29 of 32 checks passed
hello-stephen pushed a commit to hello-stephen/doris that referenced this pull request Dec 28, 2023
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
xy720 added a commit that referenced this pull request Feb 22, 2024
Labels: approved (Indicates a PR has been approved by one committer.), dev/1.2.8-merged, dev/2.0.4-merged, pick_pipelinex, reviewed, usercase (Important user case type label)
6 participants